Overview

Dataset statistics

Number of variables: 12
Number of observations: 4898
Missing cells: 5376
Missing cells (%): 9.1%
Duplicate rows: 118
Duplicate rows (%): 2.4%
Total size in memory: 459.3 KiB
Average record size in memory: 96.0 B
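
The percentages and the average record size follow directly from the counts above. A minimal sketch that rechecks them, using only the figures reported in this overview (the variable names are illustrative, not part of the report):

```python
# Figures copied from the dataset statistics above.
n_variables = 12
n_observations = 4898
missing_cells = 5376
duplicate_rows = 118
total_size_kib = 459.3

total_cells = n_variables * n_observations            # 58776 cells
missing_pct = 100 * missing_cells / total_cells       # share of empty cells
duplicate_pct = 100 * duplicate_rows / n_observations
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```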

Variable types

NUM: 11
CAT: 1

Warnings

Dataset has 118 (2.4%) duplicate rows [Duplicates]
fixed acidity has 498 (10.2%) missing values [Missing]
volatile acidity has 467 (9.5%) missing values [Missing]
citric acid has 487 (9.9%) missing values [Missing]
residual sugar has 464 (9.5%) missing values [Missing]
chlorides has 511 (10.4%) missing values [Missing]
free sulfur dioxide has 500 (10.2%) missing values [Missing]
total sulfur dioxide has 469 (9.6%) missing values [Missing]
density has 485 (9.9%) missing values [Missing]
pH has 483 (9.9%) missing values [Missing]
sulphates has 526 (10.7%) missing values [Missing]
alcohol has 486 (9.9%) missing values [Missing]
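
Each per-column percentage is the missing count divided by the 4898 observations, and the counts add up to the 5376 missing cells in the overview. A short sketch of that check, with the counts copied from the warnings above (the dictionary name is illustrative):

```python
n_observations = 4898

# Missing-value counts as listed in the warnings.
missing_counts = {
    "fixed acidity": 498,
    "volatile acidity": 467,
    "citric acid": 487,
    "residual sugar": 464,
    "chlorides": 511,
    "free sulfur dioxide": 500,
    "total sulfur dioxide": 469,
    "density": 485,
    "pH": 483,
    "sulphates": 526,
    "alcohol": 486,
}

# Per-column share of missing values, formatted to one decimal place.
missing_pct = {
    name: f"{100 * count / n_observations:.1f}%"
    for name, count in missing_counts.items()
}
total_missing = sum(missing_counts.values())

for name, pct in missing_pct.items():
    print(f"{name}: {pct}")
print("Total missing cells:", total_missing)
```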

Reproduction

Analysis started: 2020-10-09 15:44:17.202920
Analysis finished: 2020-10-09 15:44:30.955801
Duration: 13.75 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

fixed acidity
Real number (ℝ≥0)

MISSING

Distinct: 67
Distinct (%): 1.5%
Missing: 498
Missing (%): 10.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.849079545
Minimum: 3.8
Maximum: 11.8
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.3 KiB

Quantile statistics

Minimum: 3.8
5th percentile: 5.6
Q1: 6.3
Median: 6.8
Q3: 7.3
95th percentile: 8.3
Maximum: 11.8
Range: 8
Interquartile range (IQR): 1
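
The range is the maximum minus the minimum, and the IQR is Q3 minus Q1. The sketch below computes the same summary for a small hypothetical sample with Python's standard library. It assumes missing values have already been dropped and uses linear interpolation between data points, which may not match the report's quantile method exactly.

```python
from statistics import quantiles


def quantile_summary(values):
    """Return min, quartiles, max, range and IQR of non-missing values."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between data points.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
    }


# Hypothetical sample, not taken from the dataset.
sample = [6.3, 6.8, 7.3, 5.6, 8.3, 7.0, 6.6]
summary = quantile_summary(sample)
print(summary)

# The reported values for fixed acidity follow the same definitions.
reported_range = 11.8 - 3.8  # Maximum - Minimum, about 8
reported_iqr = 7.3 - 6.3     # Q3 - Q1, about 1
```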

Descriptive statistics

Standard deviation: 0.8358749443
Coefficient of variation (CV): 0.1220419384
Kurtosis: 1.113152139
Mean: 6.849079545
Median Absolute Deviation (MAD): 0.5
Skewness: 0.5085564094
Sum: 30135.95
Variance: 0.6986869225
Monotonicity: Not monotonic
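
Several of these statistics are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the standard deviation squared, and the mean is the sum divided by the number of non-missing observations. A small sketch that rechecks the fixed acidity figures (values copied from the report; variable names are illustrative):

```python
import math

# Values reported for fixed acidity.
std = 0.8358749443
mean = 6.849079545
total = 30135.95
n_observations = 4898
n_missing = 498

n_valid = n_observations - n_missing  # 4400 non-missing values
cv = std / mean                       # coefficient of variation
variance = std ** 2
mean_from_sum = total / n_valid

print(f"CV: {cv:.10f}")
print(f"Variance: {variance:.10f}")
print(f"Mean from sum: {mean_from_sum:.9f}")
```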
Histogram with fixed size bins (bins=50)
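Common values

Value | Count | Frequency (%)
6.8 | 276 | 5.6%
6.6 | 262 | 5.3%
6.4 | 252 | 5.1%
6.7 | 215 | 4.4%
6.9 | 212 | 4.3%
7 | 211 | 4.3%
6.5 | 210 | 4.3%
7.2 | 190 | 3.9%
7.4 | 177 | 3.6%
6.2 | 176 | 3.6%
Other values (57) | 2219 | 45.3%
(Missing) | 498 | 10.2%

Minimum 5 values

Value | Count | Frequency (%)
3.8 | 1 | < 0.1%
3.9 | 1 | < 0.1%
4.2 | 2 | < 0.1%
4.4 | 3 | 0.1%
4.5 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
11.8 | 1 | < 0.1%
10.7 | 1 | < 0.1%
10.3 | 2 | < 0.1%
10.2 | 1 | < 0.1%
10 | 2 | < 0.1%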

volatile acidity
Real number (ℝ≥0)

MISSING

Distinct: 119
Distinct (%): 2.7%
Missing: 467
Missing (%): 9.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2778571429
Minimum: 0.08
Maximum: 1.1
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.3 KiB

Quantile statistics

Minimum: 0.08
5th percentile: 0.15
Q1: 0.21
Median: 0.26
Q3: 0.32
95th percentile: 0.46
Maximum: 1.1
Range: 1.02
Interquartile range (IQR): 0.11

Descriptive statistics

Standard deviation: 0.1001514976
Coefficient of variation (CV): 0.3604424079
Kurtosis: 5.177265027
Mean: 0.2778571429
Median Absolute Deviation (MAD): 0.06
Skewness: 1.568396912
Sum: 1231.185
Variance: 0.01003032248
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
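Common values

Value | Count | Frequency (%)
0.28 | 232 | 4.7%
0.24 | 229 | 4.7%
0.26 | 221 | 4.5%
0.22 | 213 | 4.3%
0.25 | 213 | 4.3%
0.27 | 197 | 4.0%
0.23 | 194 | 4.0%
0.2 | 194 | 4.0%
0.3 | 181 | 3.7%
0.21 | 169 | 3.5%
Other values (109) | 2388 | 48.8%
(Missing) | 467 | 9.5%

Minimum 5 values

Value | Count | Frequency (%)
0.08 | 4 | 0.1%
0.085 | 1 | < 0.1%
0.1 | 5 | 0.1%
0.105 | 6 | 0.1%
0.11 | 12 | 0.2%

Maximum 5 values

Value | Count | Frequency (%)
1.1 | 1 | < 0.1%
1.005 | 1 | < 0.1%
0.965 | 1 | < 0.1%
0.93 | 1 | < 0.1%
0.905 | 1 | < 0.1%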

citric acid
Real number (ℝ≥0)

MISSING

Distinct: 87
Distinct (%): 2.0%
Missing: 487
Missing (%): 9.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3346701428
Minimum: 0
Maximum: 1.66
Zeros: 17
Zeros (%): 0.3%
Memory size: 38.3 KiB

Quantile statistics

Minimum: 0
5th percentile: 0.17
Q1: 0.27
Median: 0.32
Q3: 0.39
95th percentile: 0.54
Maximum: 1.66
Range: 1.66
Interquartile range (IQR): 0.12

Descriptive statistics

Standard deviation: 0.1222739898
Coefficient of variation (CV): 0.3653567324
Kurtosis: 6.46138061
Mean: 0.3346701428
Median Absolute Deviation (MAD): 0.06
Skewness: 1.336434192
Sum: 1476.23
Variance: 0.01495092858
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
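Common values

Value | Count | Frequency (%)
0.3 | 277 | 5.7%
0.28 | 250 | 5.1%
0.32 | 234 | 4.8%
0.34 | 202 | 4.1%
0.29 | 198 | 4.0%
0.26 | 196 | 4.0%
0.27 | 193 | 3.9%
0.49 | 189 | 3.9%
0.31 | 177 | 3.6%
0.24 | 165 | 3.4%
Other values (77) | 2330 | 47.6%
(Missing) | 487 | 9.9%

Minimum 5 values

Value | Count | Frequency (%)
0 | 17 | 0.3%
0.01 | 7 | 0.1%
0.02 | 4 | 0.1%
0.03 | 2 | < 0.1%
0.04 | 12 | 0.2%

Maximum 5 values

Value | Count | Frequency (%)
1.66 | 1 | < 0.1%
1.23 | 1 | < 0.1%
1 | 5 | 0.1%
0.99 | 1 | < 0.1%
0.91 | 2 | < 0.1%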

residual sugar
Real number (ℝ≥0)

MISSING

Distinct: 304
Distinct (%): 6.9%
Missing: 464
Missing (%): 9.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.393222824
Minimum: 0.6
Maximum: 65.8
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.3 KiB

Quantile statistics

Minimum: 0.6
5th percentile: 1.1
Q1: 1.7
Median: 5.2
Q3: 9.85
95th percentile: 15.8
Maximum: 65.8
Range: 65.2
Interquartile range (IQR): 8.15

Descriptive statistics

Standard deviation: 5.086484589
Coefficient of variation (CV): 0.7956057109
Kurtosis: 3.852947643
Mean: 6.393222824
Median Absolute Deviation (MAD): 3.6
Skewness: 1.117488511
Sum: 28347.55
Variance: 25.87232548
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
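Common values

Value | Count | Frequency (%)
1.2 | 172 | 3.5%
1.4 | 163 | 3.3%
1.6 | 146 | 3.0%
1.3 | 136 | 2.8%
1.1 | 135 | 2.8%
1.5 | 125 | 2.6%
1.8 | 93 | 1.9%
1.7 | 88 | 1.8%
1 | 80 | 1.6%
2 | 71 | 1.4%
Other values (294) | 3225 | 65.8%
(Missing) | 464 | 9.5%

Minimum 5 values

Value | Count | Frequency (%)
0.6 | 2 | < 0.1%
0.7 | 5 | 0.1%
0.8 | 23 | 0.5%
0.9 | 36 | 0.7%
0.95 | 2 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
65.8 | 1 | < 0.1%
31.6 | 2 | < 0.1%
26.05 | 2 | < 0.1%
23.5 | 1 | < 0.1%
22.6 | 1 | < 0.1%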

chlorides
Real number (ℝ≥0)

MISSING

Distinct: 157
Distinct (%): 3.6%
Missing: 511
Missing (%): 10.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.04601800775
Minimum: 0.009
Maximum: 0.346
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.3 KiB

Quantile statistics

Minimum: 0.009
5th percentile: 0.027
Q1: 0.036
Median: 0.043
Q3: 0.05
95th percentile: 0.0677
Maximum: 0.346
Range: 0.337
Interquartile range (IQR): 0.014

Descriptive statistics

Standard deviation: 0.02239629583
Coefficient of variation (CV): 0.4866854722
Kurtosis: 37.06569
Mean: 0.04601800775
Median Absolute Deviation (MAD): 0.007
Skewness: 5.034741483
Sum: 201.881
Variance: 0.0005015940669
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
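Common values

Value | Count | Frequency (%)
0.044 | 176 | 3.6%
0.036 | 173 | 3.5%
0.046 | 162 | 3.3%
0.048 | 159 | 3.2%
0.04 | 159 | 3.2%
0.045 | 157 | 3.2%
0.042 | 155 | 3.2%
0.047 | 155 | 3.2%
0.034 | 152 | 3.1%
0.038 | 149 | 3.0%
Other values (147) | 2790 | 57.0%
(Missing) | 511 | 10.4%

Minimum 5 values

Value | Count | Frequency (%)
0.009 | 1 | < 0.1%
0.012 | 1 | < 0.1%
0.014 | 3 | 0.1%
0.015 | 4 | 0.1%
0.016 | 5 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
0.346 | 1 | < 0.1%
0.301 | 1 | < 0.1%
0.29 | 1 | < 0.1%
0.271 | 1 | < 0.1%
0.255 | 1 | < 0.1%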

free sulfur dioxide
Real number (ℝ≥0)

MISSING

Distinct: 129
Distinct (%): 2.9%
Missing: 500
Missing (%): 10.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.39483856
Minimum: 2
Maximum: 289
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.3 KiB

Quantile statistics

Minimum: 2
5th percentile: 11
Q1: 23
Median: 34
Q3: 46
95th percentile: 63
Maximum: 289
Range: 287
Interquartile range (IQR): 23

Descriptive statistics

Standard deviation: 17.09292196
Coefficient of variation (CV): 0.4829213143
Kurtosis: 12.17592225
Mean: 35.39483856
Median Absolute Deviation (MAD): 11
Skewness: 1.448020763
Sum: 155666.5
Variance: 292.1679811
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
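Common values

Value | Count | Frequency (%)
29 | 137 | 2.8%
26 | 117 | 2.4%
35 | 116 | 2.4%
34 | 115 | 2.3%
31 | 115 | 2.3%
36 | 108 | 2.2%
24 | 106 | 2.2%
33 | 101 | 2.1%
25 | 100 | 2.0%
37 | 100 | 2.0%
Other values (119) | 3283 | 67.0%
(Missing) | 500 | 10.2%

Minimum 5 values

Value | Count | Frequency (%)
2 | 1 | < 0.1%
3 | 9 | 0.2%
4 | 9 | 0.2%
5 | 19 | 0.4%
6 | 26 | 0.5%

Maximum 5 values

Value | Count | Frequency (%)
289 | 1 | < 0.1%
146.5 | 1 | < 0.1%
131 | 1 | < 0.1%
128 | 1 | < 0.1%
124 | 1 | < 0.1%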

total sulfur dioxide
Real number (ℝ≥0)

MISSING

Distinct: 247
Distinct (%): 5.6%
Missing: 469
Missing (%): 9.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 137.7562655
Minimum: 9
Maximum: 440
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.3 KiB

Quantile statistics

Minimum: 9
5th percentile: 75
Q1: 108
Median: 134
Q3: 166
95th percentile: 210
Maximum: 440
Range: 431
Interquartile range (IQR): 58

Descriptive statistics

Standard deviation: 42.07877001
Coefficient of variation (CV): 0.3054581209
Kurtosis: 0.5829306493
Mean: 137.7562655
Median Absolute Deviation (MAD): 29
Skewness: 0.3837512055
Sum: 610122.5
Variance: 1770.622886
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
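Common values

Value | Count | Frequency (%)
111 | 60 | 1.2%
117 | 53 | 1.1%
150 | 53 | 1.1%
122 | 51 | 1.0%
118 | 50 | 1.0%
140 | 50 | 1.0%
128 | 50 | 1.0%
113 | 49 | 1.0%
114 | 48 | 1.0%
156 | 47 | 1.0%
Other values (237) | 3918 | 80.0%
(Missing) | 469 | 9.6%

Minimum 5 values

Value | Count | Frequency (%)
9 | 1 | < 0.1%
18 | 2 | < 0.1%
19 | 1 | < 0.1%
21 | 1 | < 0.1%
24 | 2 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
440 | 1 | < 0.1%
366.5 | 1 | < 0.1%
307.5 | 1 | < 0.1%
303 | 1 | < 0.1%
294 | 1 | < 0.1%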

density
Real number (ℝ≥0)

MISSING

Distinct: 869
Distinct (%): 19.7%
Missing: 485
Missing (%): 9.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9940384194
Minimum: 0.98711
Maximum: 1.03898
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.3 KiB

Quantile statistics

Minimum: 0.98711
5th percentile: 0.98963
Q1: 0.99174
Median: 0.9938
Q3: 0.99612
95th percentile: 0.999008
Maximum: 1.03898
Range: 0.05187
Interquartile range (IQR): 0.00438

Descriptive statistics

Standard deviation: 0.003013857396
Coefficient of variation (CV): 0.003031932505
Kurtosis: 10.59304292
Mean: 0.9940384194
Median Absolute Deviation (MAD): 0.00215
Skewness: 1.040244148
Sum: 4386.691545
Variance: 9.083336401e-06
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
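Common values

Value | Count | Frequency (%)
0.992 | 57 | 1.2%
0.9928 | 55 | 1.1%
0.9932 | 50 | 1.0%
0.993 | 47 | 1.0%
0.9934 | 47 | 1.0%
0.9938 | 45 | 0.9%
0.9944 | 42 | 0.9%
0.9927 | 41 | 0.8%
0.9948 | 40 | 0.8%
0.9924 | 40 | 0.8%
Other values (859) | 3949 | 80.6%
(Missing) | 485 | 9.9%

Minimum 5 values

Value | Count | Frequency (%)
0.98711 | 1 | < 0.1%
0.98713 | 1 | < 0.1%
0.98722 | 1 | < 0.1%
0.9874 | 1 | < 0.1%
0.98742 | 2 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
1.03898 | 1 | < 0.1%
1.0103 | 2 | < 0.1%
1.00295 | 2 | < 0.1%
1.00241 | 1 | < 0.1%
1.0024 | 1 | < 0.1%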

pH
Real number (ℝ≥0)

MISSING

Distinct: 101
Distinct (%): 2.3%
Missing: 483
Missing (%): 9.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.188061155
Minimum: 2.74
Maximum: 3.82
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.3 KiB

Quantile statistics

Minimum: 2.74
5th percentile: 2.96
Q1: 3.08
Median: 3.18
Q3: 3.28
95th percentile: 3.46
Maximum: 3.82
Range: 1.08
Interquartile range (IQR): 0.2

Descriptive statistics

Standard deviation: 0.1516803655
Coefficient of variation (CV): 0.04757762104
Kurtosis: 0.5643898337
Mean: 3.188061155
Median Absolute Deviation (MAD): 0.1
Skewness: 0.483780226
Sum: 14075.29
Variance: 0.02300693328
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
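Common values

Value | Count | Frequency (%)
3.14 | 156 | 3.2%
3.16 | 151 | 3.1%
3.22 | 135 | 2.8%
3.19 | 132 | 2.7%
3.24 | 124 | 2.5%
3.08 | 124 | 2.5%
3.15 | 122 | 2.5%
3.18 | 121 | 2.5%
3.12 | 120 | 2.4%
3.2 | 120 | 2.4%
Other values (91) | 3110 | 63.5%
(Missing) | 483 | 9.9%

Minimum 5 values

Value | Count | Frequency (%)
2.74 | 1 | < 0.1%
2.77 | 1 | < 0.1%
2.79 | 2 | < 0.1%
2.8 | 3 | 0.1%
2.83 | 4 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
3.82 | 1 | < 0.1%
3.81 | 1 | < 0.1%
3.8 | 2 | < 0.1%
3.79 | 1 | < 0.1%
3.77 | 2 | < 0.1%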

sulphates
Real number (ℝ≥0)

MISSING

Distinct: 78
Distinct (%): 1.8%
Missing: 526
Missing (%): 10.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.489878774
Minimum: 0.22
Maximum: 1.08
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.3 KiB

Quantile statistics

Minimum: 0.22
5th percentile: 0.34
Q1: 0.41
Median: 0.47
Q3: 0.55
95th percentile: 0.7045
Maximum: 1.08
Range: 0.86
Interquartile range (IQR): 0.14

Descriptive statistics

Standard deviation: 0.1143431759
Coefficient of variation (CV): 0.2334111661
Kurtosis: 1.642538891
Mean: 0.489878774
Median Absolute Deviation (MAD): 0.07
Skewness: 0.9843199583
Sum: 2141.75
Variance: 0.01307436187
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
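Common values

Value | Count | Frequency (%)
0.5 | 228 | 4.7%
0.46 | 205 | 4.2%
0.44 | 195 | 4.0%
0.38 | 193 | 3.9%
0.42 | 162 | 3.3%
0.47 | 156 | 3.2%
0.49 | 153 | 3.1%
0.54 | 153 | 3.1%
0.4 | 152 | 3.1%
0.48 | 151 | 3.1%
Other values (68) | 2624 | 53.6%
(Missing) | 526 | 10.7%

Minimum 5 values

Value | Count | Frequency (%)
0.22 | 1 | < 0.1%
0.23 | 1 | < 0.1%
0.25 | 4 | 0.1%
0.26 | 4 | 0.1%
0.27 | 11 | 0.2%

Maximum 5 values

Value | Count | Frequency (%)
1.08 | 1 | < 0.1%
1.06 | 1 | < 0.1%
1.01 | 1 | < 0.1%
1 | 1 | < 0.1%
0.99 | 1 | < 0.1%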

alcohol
Real number (ℝ≥0)

MISSING

Distinct: 102
Distinct (%): 2.3%
Missing: 486
Missing (%): 9.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.51835071
Minimum: 8
Maximum: 14.2
Zeros: 0
Zeros (%): 0.0%
Memory size: 38.3 KiB

Quantile statistics

Minimum: 8
5th percentile: 8.9
Q1: 9.5
Median: 10.4
Q3: 11.4
95th percentile: 12.7
Maximum: 14.2
Range: 6.2
Interquartile range (IQR): 1.9

Descriptive statistics

Standard deviation: 1.234729584
Coefficient of variation (CV): 0.117388136
Kurtosis: -0.7095914483
Mean: 10.51835071
Median Absolute Deviation (MAD): 1
Skewness: 0.4852554398
Sum: 46406.96333
Variance: 1.524557145
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
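Common values

Value | Count | Frequency (%)
9.4 | 202 | 4.1%
9.5 | 201 | 4.1%
9.2 | 189 | 3.9%
9 | 169 | 3.5%
11 | 146 | 3.0%
10.5 | 142 | 2.9%
10 | 142 | 2.9%
10.4 | 140 | 2.9%
9.1 | 133 | 2.7%
9.8 | 123 | 2.5%
Other values (92) | 2825 | 57.7%
(Missing) | 486 | 9.9%

Minimum 5 values

Value | Count | Frequency (%)
8 | 2 | < 0.1%
8.4 | 3 | 0.1%
8.5 | 3 | 0.1%
8.6 | 20 | 0.4%
8.7 | 70 | 1.4%

Maximum 5 values

Value | Count | Frequency (%)
14.2 | 1 | < 0.1%
14.05 | 1 | < 0.1%
14 | 5 | 0.1%
13.9 | 2 | < 0.1%
13.8 | 1 | < 0.1%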

quality
Categorical

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 38.3 KiB

Common values

Value | Count | Frequency (%)
6.0 | 2198 | 44.9%
5.0 | 1457 | 29.7%
7.0 | 880 | 18.0%
8.0 | 175 | 3.6%
4.0 | 163 | 3.3%
3.0 | 20 | 0.4%
9.0 | 5 | 0.1%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3
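
Every length statistic is 3 because each category label, written as text, has three characters ("6.0", "5.0", and so on). A minimal sketch, assuming the labels are handled as the strings shown in the value table (the report computes lengths over all observations, but the result is the same when every label has the same length):

```python
from statistics import mean, median

# Category labels as they appear in the value table for quality.
labels = ["6.0", "5.0", "7.0", "8.0", "4.0", "3.0", "9.0"]

lengths = [len(label) for label in labels]
length_stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(length_stats)
```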

Interactions

[121 interaction plots; images not reproduced]

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that, for a linear relationship, the steepness of the line (its angle to the x-axis) does not affect r; only the direction of the slope determines its sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
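
The calculation described above can be sketched in plain Python. The function name and the sample data are illustrative; the example also shows that lines with different slopes and offsets give the same magnitude of r.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


# Hypothetical data: two exact linear relationships with different slopes.
xs = [1.0, 2.0, 3.0, 4.0]
ys_up = [2.0 * x + 1.0 for x in xs]     # rising line
ys_down = [-0.5 * x + 3.0 for x in xs]  # falling, shallower line

r_up = pearson_r(xs, ys_up)
r_down = pearson_r(xs, ys_down)
print(r_up, r_down)
```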

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
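
A minimal sketch of this rank-based calculation, assuming tied values share the average of their ranks (a common convention, not something this report states). The helper names and sample data are illustrative.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: a monotonic but nonlinear relationship.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 3 for x in xs]

rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
print(rho, r)  # rho reaches 1 while r stays below 1
```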

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
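
The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch follows; statistical libraries often default to the tie-corrected tau-b, so values computed by the report may differ when ties are present. The function name and data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # The sign tells whether both variables move in the same direction.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Tied pairs (sign == 0) count as neither.
    return (concordant - discordant) / len(pairs)


# Hypothetical data: one swapped pair out of six.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]

tau = kendall_tau_a(xs, ys)
print(tau)  # 5 concordant and 1 discordant pair out of 6
```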

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation for it is available online.

Missing values

[4 missing-value plots; images not reproduced]

Sample

First rows

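index | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality
0 | NaN | 0.27 | 0.36 | 20.7 | 0.045 | 45.0 | 170.0 | 1.0010 | 3.00 | NaN | 8.8 | 6.0
1 | 6.3 | 0.30 | 0.34 | 1.6 | 0.049 | 14.0 | 132.0 | 0.9940 | 3.30 | 0.49 | 9.5 | 6.0
2 | 8.1 | 0.28 | 0.40 | 6.9 | 0.050 | 30.0 | 97.0 | 0.9951 | 3.26 | 0.44 | NaN | 6.0
3 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | NaN | 9.9 | 6.0
4 | 7.2 | NaN | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.40 | NaN | 6.0
5 | 8.1 | 0.28 | 0.40 | NaN | 0.050 | 30.0 | 97.0 | 0.9951 | 3.26 | 0.44 | 10.1 | 6.0
6 | 6.2 | 0.32 | 0.16 | 7.0 | 0.045 | 30.0 | 136.0 | 0.9949 | 3.18 | 0.47 | 9.6 | 6.0
7 | 7.0 | 0.27 | 0.36 | 20.7 | 0.045 | 45.0 | 170.0 | 1.0010 | 3.00 | 0.45 | 8.8 | 6.0
8 | 6.3 | 0.30 | 0.34 | 1.6 | 0.049 | NaN | 132.0 | 0.9940 | 3.30 | 0.49 | 9.5 | 6.0
9 | 8.1 | 0.22 | 0.43 | 1.5 | 0.044 | 28.0 | 129.0 | 0.9938 | 3.22 | 0.45 | 11.0 | 6.0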

Last rows

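index | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality
4888 | 6.8 | 0.220 | 0.36 | 1.20 | 0.052 | NaN | 127.0 | 0.99330 | 3.04 | 0.54 | 9.2 | 5.0
4889 | 4.9 | 0.235 | 0.27 | 11.75 | 0.030 | 34.0 | 118.0 | 0.99540 | 3.07 | 0.50 | 9.4 | 6.0
4890 | 6.1 | 0.340 | 0.29 | 2.20 | 0.036 | 25.0 | 100.0 | 0.98938 | 3.06 | 0.44 | NaN | 6.0
4891 | 5.7 | 0.210 | 0.32 | 0.90 | 0.038 | 38.0 | 121.0 | 0.99074 | NaN | NaN | 10.6 | 6.0
4892 | 6.5 | 0.230 | 0.38 | NaN | 0.032 | 29.0 | 112.0 | 0.99298 | 3.29 | 0.54 | 9.7 | 5.0
4893 | 6.2 | 0.210 | 0.29 | 1.60 | 0.039 | 24.0 | 92.0 | 0.99114 | 3.27 | 0.50 | NaN | 6.0
4894 | 6.6 | 0.320 | 0.36 | 8.00 | NaN | 57.0 | 168.0 | 0.99490 | 3.15 | 0.46 | 9.6 | 5.0
4895 | 6.5 | NaN | 0.19 | 1.20 | 0.041 | 30.0 | 111.0 | 0.99254 | 2.99 | 0.46 | 9.4 | 6.0
4896 | 5.5 | 0.290 | 0.30 | 1.10 | 0.022 | 20.0 | 110.0 | 0.98869 | 3.34 | 0.38 | 12.8 | 7.0
4897 | 6.0 | 0.210 | 0.38 | 0.80 | 0.020 | 22.0 | 98.0 | 0.98941 | 3.26 | 0.32 | 11.8 | 6.0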

Duplicate rows

Most frequent

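index | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality | count
66 | 7.3 | 0.19 | 0.27 | 13.9 | 0.057 | 45.0 | 155.0 | 0.99807 | 2.94 | 0.41 | 8.8 | 8.0 | 5
71 | 7.4 | 0.16 | 0.30 | 13.7 | 0.056 | 33.0 | 168.0 | 0.99825 | 2.90 | 0.44 | 8.7 | 7.0 | 5
63 | 7.2 | 0.25 | 0.28 | 14.4 | 0.055 | 55.0 | 205.0 | 0.99860 | 3.12 | 0.38 | 9.0 | 7.0 | 4
77 | 7.6 | 0.20 | 0.30 | 14.2 | 0.056 | 53.0 | 212.5 | 0.99900 | 3.14 | 0.46 | 8.9 | 8.0 | 4
9 | 6.2 | 0.22 | 0.28 | 2.2 | 0.040 | 24.0 | 125.0 | 0.99170 | 3.19 | 0.48 | 10.5 | 6.0 | 3
14 | 6.3 | 0.13 | 0.42 | 1.1 | 0.043 | 63.0 | 146.0 | 0.99066 | 3.13 | 0.72 | 11.2 | 7.0 | 3
16 | 6.4 | 0.24 | 0.26 | 8.2 | 0.054 | 47.0 | 182.0 | 0.99538 | 3.12 | 0.50 | 9.5 | 5.0 | 3
20 | 6.5 | 0.18 | 0.41 | 14.2 | 0.039 | 47.0 | 129.0 | 0.99678 | 3.28 | 0.72 | 10.3 | 7.0 | 3
48 | 7.0 | 0.15 | 0.28 | 14.7 | 0.051 | 29.0 | 149.0 | 0.99792 | 2.96 | 0.39 | 9.0 | 7.0 | 3
82 | 7.7 | 0.30 | 0.42 | 14.3 | 0.045 | 45.0 | 213.0 | 0.99910 | 3.18 | 0.63 | 9.2 | 5.0 | 3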
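
A "most frequent" duplicates table like the one above can be built by counting identical rows and keeping those that occur more than once. A minimal sketch with hypothetical three-column rows (not the dataset's actual rows, and without missing values); how the report itself counts duplicates is not shown in this export.

```python
from collections import Counter

# Hypothetical rows; each tuple stands for one complete record.
rows = [
    (7.3, 0.19, 8.0),
    (7.3, 0.19, 8.0),
    (6.2, 0.22, 6.0),
    (7.3, 0.19, 8.0),
    (6.2, 0.22, 6.0),
    (6.0, 0.21, 6.0),
]

counts = Counter(rows)
# Keep only rows that appear more than once.
duplicated = {row: count for row, count in counts.items() if count > 1}
# Sort by descending count, as in a "most frequent" listing.
most_frequent = sorted(duplicated.items(), key=lambda item: item[1], reverse=True)

for row, count in most_frequent:
    print(row, count)
```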